Overview

Dataset statistics

Number of variables: 12
Number of observations: 103032
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 122
Duplicate rows (%): 0.1%
Total size in memory: 9.4 MiB
Average record size in memory: 96.0 B
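The memory figures above are consistent with a shallow memory count in which every cell takes 8 bytes, whether it holds a 64-bit number or a pointer to a string object. On that reading, 12 columns × 8 bytes gives the 96.0 B average record. The check below is a sketch of that arithmetic only. The 8-byte cell size and the 128-byte index overhead are assumptions, not values stated in the report, and `memory_summary` is a hypothetical helper.

```python
def memory_summary(n_rows, n_cols, bytes_per_cell=8, index_bytes=128):
    """Estimate shallow memory figures for a table.

    Assumptions (not taken from the report): every cell costs
    `bytes_per_cell` bytes and the row index adds a fixed `index_bytes`.
    """
    column_bytes = n_rows * bytes_per_cell + index_bytes       # one column plus the index
    total_bytes = n_rows * bytes_per_cell * n_cols + index_bytes
    return {
        "column_kib": column_bytes / 1024,
        "total_mib": total_bytes / 2**20,
        "record_bytes": total_bytes / n_rows,
    }


summary = memory_summary(n_rows=103032, n_cols=12)
print(round(summary["column_kib"], 1))    # 805.1  (per-column "Memory size")
print(round(summary["total_mib"], 1))     # 9.4    ("Total size in memory")
print(round(summary["record_bytes"], 1))  # 96.0   ("Average record size in memory")

# Duplicate-row percentage as shown in the overview: 122 of 103032 rows
print(round(122 / 103032 * 100, 1))       # 0.1
```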

Variable types

Numeric: 3
Categorical: 9

Alerts

Dataset has 122 (0.1%) duplicate rows [Duplicates]
marca has high cardinality: 54 distinct values [High cardinality]
codigo_irs has high cardinality: 1665 distinct values [High cardinality]
nombre has high cardinality: 1741 distinct values [High cardinality]
marca.1 has high cardinality: 54 distinct values [High cardinality]
linea has high cardinality: 862 distinct values [High cardinality]
numero_aviso is highly overall correlated with marca and 5 other fields [High correlation]
grupo is highly overall correlated with subgrupo [High correlation]
subgrupo is highly overall correlated with grupo [High correlation]
marca is highly overall correlated with numero_aviso and 1 other field [High correlation]
accion is highly overall correlated with numero_aviso [High correlation]
accion_modelo is highly overall correlated with numero_aviso [High correlation]
marca.1 is highly overall correlated with numero_aviso and 1 other field [High correlation]
tipo_carroceria is highly overall correlated with numero_aviso [High correlation]
anio_range is highly overall correlated with numero_aviso [High correlation]
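numero_aviso is stored as a number but appears to behave like an identifier. In the sample rows, every numero_aviso value carries a single marca, linea, tipo_carroceria and anio_range. That may explain why numero_aviso correlates at 0.957 with several categorical fields. The sketch below tests whether one column determines another. It assumes rows are available as dictionaries, and the helper name `determines` is hypothetical.

```python
from collections import defaultdict


def determines(rows, key, target):
    """Return True if every value of `key` maps to exactly one value of `target`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[target])
    return all(len(values) == 1 for values in seen.values())


# (numero_aviso, marca, linea) values copied from the sample rows of this report
rows = [
    {"numero_aviso": 44023, "marca": "HYUNDAI", "linea": "accent"},
    {"numero_aviso": 17203, "marca": "NISSAN", "linea": "FRONTIER"},
    {"numero_aviso": 37133, "marca": "KIA", "linea": "rio"},
    {"numero_aviso": 17434, "marca": "HONDA", "linea": "CRV"},
    {"numero_aviso": 67217, "marca": "HYUNDAI", "linea": "TUCSON"},
    {"numero_aviso": 67229, "marca": "TOYOTA", "linea": "HILUX"},
]
print(determines(rows, "numero_aviso", "marca"))  # True: one brand per notice
print(determines(rows, "marca", "numero_aviso"))  # False: HYUNDAI has two notices
```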

Reproduction

Analysis started: 2023-05-29 19:07:34.682529
Analysis finished: 2023-05-29 19:09:36.030509
Duration: 2 minutes and 1.35 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json

Variables

numero_aviso
Real number (ℝ)

Distinct: 8708
Distinct (%): 8.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44333.365
Minimum: 8437
Maximum: 67229
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 805.1 KiB

Quantile statistics

Minimum: 8437
5th percentile: 16128
Q1: 29917
Median: 47329
Q3: 58702
95th percentile: 65938
Maximum: 67229
Range: 58792
Interquartile range (IQR): 28785

Descriptive statistics

Standard deviation: 16371.134
Coefficient of variation (CV): 0.36927342
Kurtosis: -1.1547681
Mean: 44333.365
Median Absolute Deviation (MAD): 13622
Skewness: -0.3645402
Sum: 4.5677553 × 10^9
Variance: 2.6801401 × 10^8
Monotonicity: Not monotonic
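Several figures above are derived from others:

- CV = 16371.134 / 44333.365 ≈ 0.369
- Range = 67229 - 8437 = 58792
- IQR = 58702 - 29917 = 28785

The sketch below recomputes these summaries from raw values. It assumes the sample standard deviation (n - 1 denominator) and linear-interpolated quartiles, which are the pandas defaults. The report does not state its exact formulas, so treat `describe` as a hypothetical helper rather than the tool's code.

```python
import statistics


def describe(values):
    """Recompute a subset of the report's numeric summaries.

    Assumes sample standard deviation and linear-interpolated quartiles
    (statistics.quantiles with method="inclusive").
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)                       # sample standard deviation
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mad = statistics.median(abs(v - median) for v in values)
    return {
        "mean": mean,
        "std": sd,
        "cv": sd / mean,                    # coefficient of variation
        "range": max(values) - min(values),
        "iqr": q3 - q1,                     # interquartile range
        "mad": mad,                         # median absolute deviation
    }


# Expand the grupo value counts reported further below into raw values
grupo_counts = {1: 193, 2: 76505, 3: 9241, 4: 1164, 5: 14054, 6: 1875}
grupo = [value for value, count in grupo_counts.items() for _ in range(count)]
print(describe(grupo))
```

Applied to the grupo counts, this gives a mean of about 2.5924, a standard deviation of about 1.1422, an IQR of 1 and a MAD of 0. These match the grupo section of this report.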
[Figure: Histogram with fixed size bins (bins=50)]

Common values
ValueCountFrequency (%)
50067 111
 
0.1%
66892 105
 
0.1%
24572 104
 
0.1%
50602 95
 
0.1%
27170 94
 
0.1%
61699 93
 
0.1%
17279 93
 
0.1%
22515 92
 
0.1%
62289 92
 
0.1%
56201 92
 
0.1%
Other values (8698) 102061
99.1%

Minimum values
ValueCountFrequency (%)
8437 2
 
< 0.1%
9427 2
 
< 0.1%
9928 3
< 0.1%
10101 7
< 0.1%
10275 4
< 0.1%
10508 4
< 0.1%
10708 1
 
< 0.1%
11198 3
< 0.1%
11199 3
< 0.1%
11200 3
< 0.1%

Maximum values
ValueCountFrequency (%)
67229 17
< 0.1%
67226 10
 
< 0.1%
67217 35
< 0.1%
67212 2
 
< 0.1%
67208 11
 
< 0.1%
67204 24
< 0.1%
67203 10
 
< 0.1%
67202 3
 
< 0.1%
67200 18
< 0.1%
67198 3
 
< 0.1%

marca
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 54
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
HYUNDAI
22454 
TOYOTA
19608 
KIA
17090 
NISSAN
13499 
SUZUKI
7722 
Other values (49)
22659 

Length

Max length: 13
Median length: 12
Mean length: 5.8168239
Min length: 2

Characters and Unicode

Total characters: 599319
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
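The category names used throughout this report, such as "Uppercase Letter" or "Dash Punctuation", correspond to Unicode general categories. Python exposes these as two-letter codes through `unicodedata.category`. The sketch below counts characters per category. The code-to-name mapping covers only the categories that appear in this report, and `category_counts` is a hypothetical helper. Scripts and blocks are not covered, because the standard library has no lookup for them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in this report
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
}


def category_counts(values):
    """Count the characters of all strings in `values` by Unicode general category."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# "atr\u00e1s" is written with an escape so the accented letter stays a single code point
print(category_counts(["HYUNDAI", "2004 - atr\u00e1s"]))
```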

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: HYUNDAI
2nd row: HYUNDAI
3rd row: NISSAN
4th row: KIA
5th row: HONDA

Common Values

ValueCountFrequency (%)
HYUNDAI 22454
21.8%
TOYOTA 19608
19.0%
KIA 17090
16.6%
NISSAN 13499
13.1%
SUZUKI 7722
 
7.5%
HONDA 5287
 
5.1%
MITSUBISHI 3336
 
3.2%
CHEVROLET 2555
 
2.5%
MAZDA 2484
 
2.4%
FORD 1935
 
1.9%
Other values (44) 7062
 
6.9%

Length

[Figure: Histogram of lengths of the category]

Words
ValueCountFrequency (%)
hyundai 22454
21.7%
toyota 19608
19.0%
kia 17090
16.6%
nissan 13499
13.1%
suzuki 7729
 
7.5%
honda 5287
 
5.1%
mitsubishi 3336
 
3.2%
chevrolet 2555
 
2.5%
mazda 2484
 
2.4%
ford 1935
 
1.9%
Other values (45) 7274
 
7.0%

Most occurring characters

ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (18) 84840
14.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 598538
99.9%
Dash Punctuation 555
 
0.1%
Space Separator 219
 
< 0.1%
Other Punctuation 7
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (15) 84059
14.0%
Dash Punctuation
ValueCountFrequency (%)
- 555
100.0%
Space Separator
ValueCountFrequency (%)
219
100.0%
Other Punctuation
ValueCountFrequency (%)
% 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 598538
99.9%
Common 781
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (15) 84059
14.0%
Common
ValueCountFrequency (%)
- 555
71.1%
219
 
28.0%
% 7
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 599319
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (18) 84840
14.2%

codigo_irs
Categorical

Distinct: 1665
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
02090112nndn
 
3134
02090113nntn
 
2194
02090066nddn
 
2145
02090200nndn
 
2111
02090067nidn
 
2077
Other values (1660)
91371 

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 1236384
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 234
Unique (%): 0.2%

Sample

1st row: 03060003nndn
2nd row: 03030005nndn
3rd row: 02090225nidn
4th row: 03040016nidn
5th row: 02090021nddn

Common Values

ValueCountFrequency (%)
02090112nndn 3134
 
3.0%
02090113nntn 2194
 
2.1%
02090066nddn 2145
 
2.1%
02090200nndn 2111
 
2.0%
02090067nidn 2077
 
2.0%
02090129nndn 1449
 
1.4%
02090143nddn 1416
 
1.4%
03040015nddn 1399
 
1.4%
03040016nidn 1369
 
1.3%
02090144nidn 1347
 
1.3%
Other values (1655) 84391
81.9%

Length

[Figure: Histogram of lengths of the category]

Words
ValueCountFrequency (%)
02090112nndn 3134
 
3.0%
02090113nntn 2194
 
2.1%
02090066nddn 2145
 
2.1%
02090200nndn 2111
 
2.0%
02090067nidn 2077
 
2.0%
02090129nndn 1449
 
1.4%
02090143nddn 1416
 
1.4%
03040015nddn 1399
 
1.4%
03040016nidn 1369
 
1.3%
02090144nidn 1347
 
1.3%
Other values (1655) 84391
81.9%

Most occurring characters

ValueCountFrequency (%)
0 396616
32.1%
n 244841
19.8%
2 110681
 
9.0%
d 97988
 
7.9%
1 72285
 
5.8%
9 69551
 
5.6%
i 38393
 
3.1%
5 36894
 
3.0%
4 36581
 
3.0%
3 35123
 
2.8%
Other values (6) 97431
 
7.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 824256
66.7%
Lowercase Letter 412128
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 396616
48.1%
2 110681
 
13.4%
1 72285
 
8.8%
9 69551
 
8.4%
5 36894
 
4.5%
4 36581
 
4.4%
3 35123
 
4.3%
6 29881
 
3.6%
7 19917
 
2.4%
8 16727
 
2.0%
Lowercase Letter
ValueCountFrequency (%)
n 244841
59.4%
d 97988
23.8%
i 38393
 
9.3%
t 25627
 
6.2%
e 3684
 
0.9%
s 1595
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common 824256
66.7%
Latin 412128
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 396616
48.1%
2 110681
 
13.4%
1 72285
 
8.8%
9 69551
 
8.4%
5 36894
 
4.5%
4 36581
 
4.4%
3 35123
 
4.3%
6 29881
 
3.6%
7 19917
 
2.4%
8 16727
 
2.0%
Latin
ValueCountFrequency (%)
n 244841
59.4%
d 97988
23.8%
i 38393
 
9.3%
t 25627
 
6.2%
e 3684
 
0.9%
s 1595
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1236384
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 396616
32.1%
n 244841
19.8%
2 110681
 
9.0%
d 97988
 
7.9%
1 72285
 
5.8%
9 69551
 
5.6%
i 38393
 
3.1%
5 36894
 
3.0%
4 36581
 
3.0%
3 35123
 
2.8%
Other values (6) 97431
 
7.9%

nombre
Categorical

Distinct: 1741
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
defensa delantero
 
3134
defensa trasero
 
2194
guardafango derecho
 
2145
tapa motor
 
2111
guardafango izquierdo
 
2077
Other values (1736)
91371 

Length

Max length: 65
Median length: 51
Mean length: 26.0244
Min length: 5

Characters and Unicode

Total characters: 2681346
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 256
Unique (%): 0.2%

Sample

1st row: computadora
2nd row: cable ground de bateria
3rd row: pollera delantero izquierdo
4th row: lampara delantero izquierdo
5th row: bisagra derecho tapa motor

Common Values

ValueCountFrequency (%)
defensa delantero 3134
 
3.0%
defensa trasero 2194
 
2.1%
guardafango derecho 2145
 
2.1%
tapa motor 2111
 
2.0%
guardafango izquierdo 2077
 
2.0%
parrilla 1449
 
1.4%
puerta delantero derecho 1416
 
1.4%
lampara delantero derecho 1399
 
1.4%
lampara delantero izquierdo 1369
 
1.3%
puerta delantero izquierdo 1347
 
1.3%
Other values (1731) 84391
81.9%

Length

[Figure: Histogram of lengths of the category]

Words
ValueCountFrequency (%)
delantero 39501
 
11.4%
derecho 31004
 
8.9%
izquierdo 29473
 
8.5%
trasero 21497
 
6.2%
defensa 20454
 
5.9%
de 10451
 
3.0%
puerta 9277
 
2.7%
guardafango 9249
 
2.7%
tapa 8639
 
2.5%
motor 7180
 
2.1%
Other values (637) 159786
46.1%

Most occurring characters

ValueCountFrequency (%)
e 329560
12.3%
a 296092
11.0%
r 288022
10.7%
o 245549
9.2%
243682
9.1%
d 171769
 
6.4%
i 143239
 
5.3%
t 136085
 
5.1%
l 118136
 
4.4%
n 115052
 
4.3%
Other values (27) 594160
22.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2436739
90.9%
Space Separator 243682
 
9.1%
Decimal Number 616
 
< 0.1%
Other Punctuation 273
 
< 0.1%
Dash Punctuation 36
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 329560
13.5%
a 296092
12.2%
r 288022
11.8%
o 245549
10.1%
d 171769
 
7.0%
i 143239
 
5.9%
t 136085
 
5.6%
l 118136
 
4.8%
n 115052
 
4.7%
u 87953
 
3.6%
Other values (17) 505282
20.7%
Decimal Number
ValueCountFrequency (%)
1 254
41.2%
4 194
31.5%
6 97
 
15.7%
3 41
 
6.7%
7 15
 
2.4%
8 15
 
2.4%
Other Punctuation
ValueCountFrequency (%)
/ 220
80.6%
. 53
 
19.4%
Space Separator
ValueCountFrequency (%)
243682
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 36
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2436739
90.9%
Common 244607
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 329560
13.5%
a 296092
12.2%
r 288022
11.8%
o 245549
10.1%
d 171769
 
7.0%
i 143239
 
5.9%
t 136085
 
5.6%
l 118136
 
4.8%
n 115052
 
4.7%
u 87953
 
3.6%
Other values (17) 505282
20.7%
Common
ValueCountFrequency (%)
243682
99.6%
1 254
 
0.1%
/ 220
 
0.1%
4 194
 
0.1%
6 97
 
< 0.1%
. 53
 
< 0.1%
3 41
 
< 0.1%
- 36
 
< 0.1%
7 15
 
< 0.1%
8 15
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2680530
> 99.9%
None 816
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 329560
12.3%
a 296092
11.0%
r 288022
10.7%
o 245549
9.2%
243682
9.1%
d 171769
 
6.4%
i 143239
 
5.3%
t 136085
 
5.1%
l 118136
 
4.4%
n 115052
 
4.3%
Other values (26) 593344
22.1%
None
ValueCountFrequency (%)
ñ 816
100.0%

accion
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
cambiar
75598 
reparar
27434 

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 721224
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: cambiar
2nd row: cambiar
3rd row: cambiar
4th row: cambiar
5th row: cambiar

Common Values

ValueCountFrequency (%)
cambiar 75598
73.4%
reparar 27434
 
26.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure not reproduced]

Words
ValueCountFrequency (%)
cambiar 75598
73.4%
reparar 27434
 
26.6%

Most occurring characters

ValueCountFrequency (%)
a 206064
28.6%
r 157900
21.9%
c 75598
 
10.5%
m 75598
 
10.5%
b 75598
 
10.5%
i 75598
 
10.5%
e 27434
 
3.8%
p 27434
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 721224
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 206064
28.6%
r 157900
21.9%
c 75598
 
10.5%
m 75598
 
10.5%
b 75598
 
10.5%
i 75598
 
10.5%
e 27434
 
3.8%
p 27434
 
3.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 721224
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 206064
28.6%
r 157900
21.9%
c 75598
 
10.5%
m 75598
 
10.5%
b 75598
 
10.5%
i 75598
 
10.5%
e 27434
 
3.8%
p 27434
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 721224
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 206064
28.6%
r 157900
21.9%
c 75598
 
10.5%
m 75598
 
10.5%
b 75598
 
10.5%
i 75598
 
10.5%
e 27434
 
3.8%
p 27434
 
3.8%

accion_modelo
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
cambiar
67025 
reparar
36007 

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 721224
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: cambiar
2nd row: cambiar
3rd row: cambiar
4th row: reparar
5th row: cambiar

Common Values

ValueCountFrequency (%)
cambiar 67025
65.1%
reparar 36007
34.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure not reproduced]

Words
ValueCountFrequency (%)
cambiar 67025
65.1%
reparar 36007
34.9%
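accion and accion_modelo use the same two labels but in different proportions. In accion, 75598 rows are cambiar; in accion_modelo, only 67025 are. So at least 75598 - 67025 = 8573 rows must disagree between the two columns. Sample row 3 is one such case: accion is cambiar and accion_modelo is reparar.

The sketch below cross-tabulates the two columns and computes their agreement rate. The input pairs are copied from the first ten sample rows of this report. The helper names are hypothetical.

```python
from collections import Counter


def crosstab(pairs):
    """Count each (accion, accion_modelo) combination."""
    return Counter(pairs)


def agreement_rate(pairs):
    """Share of rows where both columns hold the same label."""
    pairs = list(pairs)
    return sum(a == b for a, b in pairs) / len(pairs)


# (accion, accion_modelo) for sample rows 0-9 of this report
sample = (
    [("cambiar", "cambiar")] * 3
    + [("cambiar", "reparar")]
    + [("cambiar", "cambiar")] * 3
    + [("reparar", "reparar")] * 3
)
print(crosstab(sample))
print(agreement_rate(sample))  # 0.9
```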

Most occurring characters

ValueCountFrequency (%)
a 206064
28.6%
r 175046
24.3%
c 67025
 
9.3%
m 67025
 
9.3%
b 67025
 
9.3%
i 67025
 
9.3%
e 36007
 
5.0%
p 36007
 
5.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 721224
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 206064
28.6%
r 175046
24.3%
c 67025
 
9.3%
m 67025
 
9.3%
b 67025
 
9.3%
i 67025
 
9.3%
e 36007
 
5.0%
p 36007
 
5.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 721224
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 206064
28.6%
r 175046
24.3%
c 67025
 
9.3%
m 67025
 
9.3%
b 67025
 
9.3%
i 67025
 
9.3%
e 36007
 
5.0%
p 36007
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 721224
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 206064
28.6%
r 175046
24.3%
c 67025
 
9.3%
m 67025
 
9.3%
b 67025
 
9.3%
i 67025
 
9.3%
e 36007
 
5.0%
p 36007
 
5.0%

marca.1
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 54
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
HYUNDAI
22454 
TOYOTA
19608 
KIA
17090 
NISSAN
13499 
SUZUKI
7722 
Other values (49)
22659 
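marca.1 reports exactly the same statistics as marca, and the correlation table gives 1.000 between the two. When a CSV file has two columns with the same header, pandas `read_csv` renames the second copy with a `.1` suffix. marca.1 is therefore likely a repeated marca column. The sketch below checks that two columns match row by row before dropping one. The helper names are hypothetical.

```python
def identical_columns(rows, a, b):
    """True if columns `a` and `b` hold the same value in every row."""
    return all(row[a] == row[b] for row in rows)


def drop_column(rows, name):
    """Return copies of the rows without column `name`."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


rows = [
    {"marca": "HYUNDAI", "marca.1": "HYUNDAI"},
    {"marca": "NISSAN", "marca.1": "NISSAN"},
    {"marca": "KIA", "marca.1": "KIA"},
]
if identical_columns(rows, "marca", "marca.1"):
    rows = drop_column(rows, "marca.1")
print(rows[0])  # {'marca': 'HYUNDAI'}
```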

Length

Max length: 13
Median length: 12
Mean length: 5.8168239
Min length: 2

Characters and Unicode

Total characters: 599319
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: HYUNDAI
2nd row: HYUNDAI
3rd row: NISSAN
4th row: KIA
5th row: HONDA

Common Values

ValueCountFrequency (%)
HYUNDAI 22454
21.8%
TOYOTA 19608
19.0%
KIA 17090
16.6%
NISSAN 13499
13.1%
SUZUKI 7722
 
7.5%
HONDA 5287
 
5.1%
MITSUBISHI 3336
 
3.2%
CHEVROLET 2555
 
2.5%
MAZDA 2484
 
2.4%
FORD 1935
 
1.9%
Other values (44) 7062
 
6.9%

Length

[Figure: Histogram of lengths of the category]

Words
ValueCountFrequency (%)
hyundai 22454
21.7%
toyota 19608
19.0%
kia 17090
16.6%
nissan 13499
13.1%
suzuki 7729
 
7.5%
honda 5287
 
5.1%
mitsubishi 3336
 
3.2%
chevrolet 2555
 
2.5%
mazda 2484
 
2.4%
ford 1935
 
1.9%
Other values (45) 7274
 
7.0%

Most occurring characters

ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (18) 84840
14.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 598538
99.9%
Dash Punctuation 555
 
0.1%
Space Separator 219
 
< 0.1%
Other Punctuation 7
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (15) 84059
14.0%
Dash Punctuation
ValueCountFrequency (%)
- 555
100.0%
Space Separator
ValueCountFrequency (%)
219
100.0%
Other Punctuation
ValueCountFrequency (%)
% 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 598538
99.9%
Common 781
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (15) 84059
14.0%
Common
ValueCountFrequency (%)
- 555
71.1%
219
 
28.0%
% 7
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 599319
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 85671
14.3%
I 73721
12.3%
N 57128
9.5%
O 50312
8.4%
U 46719
7.8%
T 45549
7.6%
S 45377
7.6%
Y 42161
7.0%
H 34027
 
5.7%
D 33814
 
5.6%
Other values (18) 84840
14.2%

linea
Categorical

Distinct: 862
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
RIO
 
3810
HILUX
 
3790
ACCENT
 
3277
hilux
 
2537
VERSA
 
2516
Other values (857)
87102 

Length

Max length: 36
Median length: 28
Mean length: 6.7087604
Min length: 1

Characters and Unicode

Total characters: 691217
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 41
Unique (%): < 0.1%

Sample

1st row: accent
2nd row: accent
3rd row: FRONTIER
4th row: rio
5th row: CRV

Common Values

ValueCountFrequency (%)
RIO 3810
 
3.7%
HILUX 3790
 
3.7%
ACCENT 3277
 
3.2%
hilux 2537
 
2.5%
VERSA 2516
 
2.4%
rio 2489
 
2.4%
YARIS 2084
 
2.0%
accent 2049
 
2.0%
TUCSON 1945
 
1.9%
SPORTAGE 1710
 
1.7%
Other values (852) 76825
74.6%
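Part of the high cardinality of linea comes from inconsistent casing. RIO and rio, HILUX and hilux, and ACCENT and accent are counted as separate values. The sketch below merges value counts that differ only in case or surrounding whitespace. The input counts are copied from the table above, and `normalized_counts` is a hypothetical helper. Case folding does not merge longer variants such as "RIO SEDAN", which appears in the duplicate-rows table.

```python
from collections import Counter


def normalized_counts(counts):
    """Merge value counts that differ only in case or surrounding whitespace."""
    merged = Counter()
    for value, n in counts.items():
        merged[value.strip().casefold()] += n
    return merged


# Counts copied from the linea "Common Values" table
linea = {"RIO": 3810, "HILUX": 3790, "ACCENT": 3277, "hilux": 2537,
         "VERSA": 2516, "rio": 2489, "YARIS": 2084, "accent": 2049}
merged = normalized_counts(linea)
print(merged.most_common(3))  # [('hilux', 6327), ('rio', 6299), ('accent', 5326)]
```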

Length

[Figure: Histogram of lengths of the category]

Words
ValueCountFrequency (%)
accent 9214
 
7.0%
rio 7776
 
5.9%
hilux 6860
 
5.2%
yaris 4400
 
3.4%
versa 4098
 
3.1%
tucson 3545
 
2.7%
sedan 3443
 
2.6%
frontier 3036
 
2.3%
crv 2686
 
2.0%
vitara 2339
 
1.8%
Other values (590) 83906
63.9%

Most occurring characters

ValueCountFrequency (%)
A 53804
 
7.8%
R 43402
 
6.3%
E 35957
 
5.2%
I 35686
 
5.2%
C 33235
 
4.8%
T 31529
 
4.6%
28670
 
4.1%
N 28646
 
4.1%
S 28476
 
4.1%
O 28306
 
4.1%
Other values (59) 343506
49.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 434495
62.9%
Lowercase Letter 183413
26.5%
Decimal Number 38679
 
5.6%
Space Separator 28670
 
4.1%
Dash Punctuation 3362
 
0.5%
Open Punctuation 1154
 
0.2%
Close Punctuation 1154
 
0.2%
Other Punctuation 281
 
< 0.1%
Connector Punctuation 9
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 53804
12.4%
R 43402
10.0%
E 35957
 
8.3%
I 35686
 
8.2%
C 33235
 
7.6%
T 31529
 
7.3%
N 28646
 
6.6%
S 28476
 
6.6%
O 28306
 
6.5%
L 20019
 
4.6%
Other values (16) 95435
22.0%
Lowercase Letter
ValueCountFrequency (%)
r 21150
11.5%
a 20814
11.3%
i 15925
8.7%
c 14513
 
7.9%
e 14247
 
7.8%
t 13672
 
7.5%
o 13109
 
7.1%
n 11460
 
6.2%
s 11172
 
6.1%
l 9805
 
5.3%
Other values (16) 37546
20.5%
Decimal Number
ValueCountFrequency (%)
0 14717
38.0%
2 4891
 
12.6%
3 4557
 
11.8%
5 4140
 
10.7%
1 3791
 
9.8%
4 3307
 
8.5%
7 1236
 
3.2%
6 1126
 
2.9%
9 466
 
1.2%
8 448
 
1.2%
Other Punctuation
ValueCountFrequency (%)
. 277
98.6%
, 4
 
1.4%
Space Separator
ValueCountFrequency (%)
28670
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3362
100.0%
Open Punctuation
ValueCountFrequency (%)
[ 1154
100.0%
Close Punctuation
ValueCountFrequency (%)
] 1154
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 617908
89.4%
Common 73309
 
10.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 53804
 
8.7%
R 43402
 
7.0%
E 35957
 
5.8%
I 35686
 
5.8%
C 33235
 
5.4%
T 31529
 
5.1%
N 28646
 
4.6%
S 28476
 
4.6%
O 28306
 
4.6%
r 21150
 
3.4%
Other values (42) 277717
44.9%
Common
ValueCountFrequency (%)
28670
39.1%
0 14717
20.1%
2 4891
 
6.7%
3 4557
 
6.2%
5 4140
 
5.6%
1 3791
 
5.2%
- 3362
 
4.6%
4 3307
 
4.5%
7 1236
 
1.7%
[ 1154
 
1.6%
Other values (7) 3484
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 691217
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 53804
 
7.8%
R 43402
 
6.3%
E 35957
 
5.2%
I 35686
 
5.2%
C 33235
 
4.8%
T 31529
 
4.6%
28670
 
4.1%
N 28646
 
4.1%
S 28476
 
4.1%
O 28306
 
4.1%
Other values (59) 343506
49.7%

grupo
Real number (ℝ)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5924179
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 805.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 2
Median: 2
Q3: 3
95th percentile: 5
Maximum: 6
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.1421983
Coefficient of variation (CV): 0.4405919
Kurtosis: 1.2228235
Mean: 2.5924179
Median Absolute Deviation (MAD): 0
Skewness: 1.6772637
Sum: 267102
Variance: 1.304617
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]

Common values
ValueCountFrequency (%)
2 76505
74.3%
5 14054
 
13.6%
3 9241
 
9.0%
6 1875
 
1.8%
4 1164
 
1.1%
1 193
 
0.2%

Minimum values
ValueCountFrequency (%)
1 193
 
0.2%
2 76505
74.3%
3 9241
 
9.0%
4 1164
 
1.1%
5 14054
 
13.6%
6 1875
 
1.8%

Maximum values
ValueCountFrequency (%)
6 1875
 
1.8%
5 14054
 
13.6%
4 1164
 
1.1%
3 9241
 
9.0%
2 76505
74.3%
1 193
 
0.2%

subgrupo
Real number (ℝ)

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.4310894
Minimum: 1
Maximum: 11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 805.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 5
Median: 9
Q3: 9
95th percentile: 9
Maximum: 11
Range: 10
Interquartile range (IQR): 4
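For a variable with few distinct values, such as subgrupo, the quantiles above follow directly from the cumulative value counts. Value 9 alone covers 56.6% of rows, so the median, Q3 and 95th percentile all land on 9. The sketch below computes a quantile from a {value: count} table without expanding it into raw rows. It assumes the linear-interpolation rule (position p × (n - 1) in the sorted data), which is the pandas and NumPy default. The report does not state which rule it used, and `quantile_from_counts` is a hypothetical helper.

```python
def quantile_from_counts(counts, p):
    """Linear-interpolated quantile of data given as {value: count}."""
    items = sorted(counts.items())
    n = sum(c for _, c in items)
    pos = p * (n - 1)                 # fractional position in the sorted data
    lo_idx = int(pos)
    hi_idx = min(lo_idx + 1, n - 1)
    frac = pos - lo_idx

    def value_at(index):
        # Walk the cumulative counts until the 0-based index is covered
        cumulative = 0
        for value, c in items:
            cumulative += c
            if index < cumulative:
                return value
        raise IndexError(index)

    lo, hi = value_at(lo_idx), value_at(hi_idx)
    return lo + (hi - lo) * frac


# subgrupo value counts from the tables in this section
subgrupo = {1: 1252, 2: 3891, 3: 4270, 4: 12494, 5: 4399, 6: 4542,
            7: 2677, 8: 6658, 9: 58282, 10: 2205, 11: 2362}
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p, quantile_from_counts(subgrupo, p))
```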

Descriptive statistics

Standard deviation: 2.460048
Coefficient of variation (CV): 0.3310481
Kurtosis: -0.37356842
Mean: 7.4310894
Median Absolute Deviation (MAD): 0
Skewness: -1.0028082
Sum: 765640
Variance: 6.0518362
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=11)]

Common values
ValueCountFrequency (%)
9 58282
56.6%
4 12494
 
12.1%
8 6658
 
6.5%
6 4542
 
4.4%
5 4399
 
4.3%
3 4270
 
4.1%
2 3891
 
3.8%
7 2677
 
2.6%
11 2362
 
2.3%
10 2205
 
2.1%

Minimum values
ValueCountFrequency (%)
1 1252
 
1.2%
2 3891
 
3.8%
3 4270
 
4.1%
4 12494
 
12.1%
5 4399
 
4.3%
6 4542
 
4.4%
7 2677
 
2.6%
8 6658
 
6.5%
9 58282
56.6%
10 2205
 
2.1%

Maximum values
ValueCountFrequency (%)
11 2362
 
2.3%
10 2205
 
2.1%
9 58282
56.6%
8 6658
 
6.5%
7 2677
 
2.6%
6 4542
 
4.4%
5 4399
 
4.3%
4 12494
 
12.1%
3 4270
 
4.1%
2 3891
 
3.8%

tipo_carroceria
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
sedan
46705 
camioneta
34963 
pickup
13895 
coupe
 
4631
utilitario
 
2838

Length

Max length: 10
Median length: 9
Mean length: 6.6299499
Min length: 5

Characters and Unicode

Total characters: 683097
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: sedan
2nd row: sedan
3rd row: pickup
4th row: sedan
5th row: camioneta

Common Values

ValueCountFrequency (%)
sedan 46705
45.3%
camioneta 34963
33.9%
pickup 13895
 
13.5%
coupe 4631
 
4.5%
utilitario 2838
 
2.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure not reproduced]

Words
ValueCountFrequency (%)
sedan 46705
45.3%
camioneta 34963
33.9%
pickup 13895
 
13.5%
coupe 4631
 
4.5%
utilitario 2838
 
2.8%

Most occurring characters

ValueCountFrequency (%)
a 119469
17.5%
e 86299
12.6%
n 81668
12.0%
i 57372
8.4%
c 53489
7.8%
s 46705
 
6.8%
d 46705
 
6.8%
o 42432
 
6.2%
t 40639
 
5.9%
m 34963
 
5.1%
Other values (5) 73356
10.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 683097
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 119469
17.5%
e 86299
12.6%
n 81668
12.0%
i 57372
8.4%
c 53489
7.8%
s 46705
 
6.8%
d 46705
 
6.8%
o 42432
 
6.2%
t 40639
 
5.9%
m 34963
 
5.1%
Other values (5) 73356
10.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 683097
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 119469
17.5%
e 86299
12.6%
n 81668
12.0%
i 57372
8.4%
c 53489
7.8%
s 46705
 
6.8%
d 46705
 
6.8%
o 42432
 
6.2%
t 40639
 
5.9%
m 34963
 
5.1%
Other values (5) 73356
10.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 683097
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 119469
17.5%
e 86299
12.6%
n 81668
12.0%
i 57372
8.4%
c 53489
7.8%
s 46705
 
6.8%
d 46705
 
6.8%
o 42432
 
6.2%
t 40639
 
5.9%
m 34963
 
5.1%
Other values (5) 73356
10.7%

anio_range
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 805.1 KiB
2020 - 2017
48783 
2016 - 2013
26062 
2024 - 2021
22646 
2012 - 2009
 
4309
2008 - 2005
 
704

Length

Max length: 12
Median length: 11
Mean length: 11.005125
Min length: 11

Characters and Unicode

Total characters: 1133880
Distinct characters: 17
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2012 - 2009
2nd row: 2012 - 2009
3rd row: 2016 - 2013
4th row: 2020 - 2017
5th row: 2016 - 2013

Common Values

ValueCountFrequency (%)
2020 - 2017 48783
47.3%
2016 - 2013 26062
25.3%
2024 - 2021 22646
22.0%
2012 - 2009 4309
 
4.2%
2008 - 2005 704
 
0.7%
2004 - atrás 528
 
0.5%
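anio_range labels list the newest model year first. The oldest bucket, "2004 - atrás", is open-ended: "atrás" is Spanish for "back", so the bucket means 2004 and earlier. Turning the labels into numeric bounds makes them usable as ordered features. The sketch below is one way to do that. `parse_anio_range` is a hypothetical helper, and returning `None` for the open lower bound is a choice of this sketch.

```python
def parse_anio_range(label):
    """Split an anio_range label into (oldest, newest) model years.

    '2020 - 2017' -> (2017, 2020). The open-ended bucket '2004 - atrás'
    ('2004 and earlier') -> (None, 2004).
    """
    newest, _, oldest = (part.strip() for part in label.partition("-"))
    upper = int(newest)
    lower = int(oldest) if oldest.isdigit() else None  # non-numeric means no lower bound
    return lower, upper


for label in ["2020 - 2017", "2024 - 2021", "2004 - atr\u00e1s"]:
    print(label, parse_anio_range(label))
```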

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure not reproduced]

Words
ValueCountFrequency (%)
103032
33.3%
2020 48783
15.8%
2017 48783
15.8%
2016 26062
 
8.4%
2013 26062
 
8.4%
2024 22646
 
7.3%
2021 22646
 
7.3%
2012 4309
 
1.4%
2009 4309
 
1.4%
2008 704
 
0.2%
Other values (3) 1760
 
0.6%

Most occurring characters

ValueCountFrequency (%)
2 303920
26.8%
0 260564
23.0%
206064
18.2%
1 127862
11.3%
- 103032
 
9.1%
7 48783
 
4.3%
6 26062
 
2.3%
3 26062
 
2.3%
4 23174
 
2.0%
9 4309
 
0.4%
Other values (7) 4048
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 822144
72.5%
Space Separator 206064
 
18.2%
Dash Punctuation 103032
 
9.1%
Lowercase Letter 2640
 
0.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 303920
37.0%
0 260564
31.7%
1 127862
15.6%
7 48783
 
5.9%
6 26062
 
3.2%
3 26062
 
3.2%
4 23174
 
2.8%
9 4309
 
0.5%
8 704
 
0.1%
5 704
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
a 528
20.0%
t 528
20.0%
r 528
20.0%
á 528
20.0%
s 528
20.0%
Space Separator
ValueCountFrequency (%)
206064
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 103032
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1131240
99.8%
Latin 2640
 
0.2%

Most frequent character per script

Common
ValueCountFrequency (%)
2 303920
26.9%
0 260564
23.0%
206064
18.2%
1 127862
11.3%
- 103032
 
9.1%
7 48783
 
4.3%
6 26062
 
2.3%
3 26062
 
2.3%
4 23174
 
2.0%
9 4309
 
0.4%
Other values (2) 1408
 
0.1%
Latin
ValueCountFrequency (%)
a 528
20.0%
t 528
20.0%
r 528
20.0%
á 528
20.0%
s 528
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1133352
> 99.9%
None 528
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 303920
26.8%
0 260564
23.0%
206064
18.2%
1 127862
11.3%
- 103032
 
9.1%
7 48783
 
4.3%
6 26062
 
2.3%
3 26062
 
2.3%
4 23174
 
2.0%
9 4309
 
0.4%
Other values (6) 3520
 
0.3%
None
ValueCountFrequency (%)
á 528
100.0%

Interactions

[Figures: 9 interaction plots, not reproduced]

Correlations

[Figure: correlation heatmap, not reproduced]

variable | numero_aviso | grupo | subgrupo | marca | accion | accion_modelo | marca.1 | tipo_carroceria | anio_range
numero_aviso | 1.000 | 0.025 | -0.047 | 0.957 | 0.552 | 0.812 | 0.957 | 0.957 | 0.957
grupo | 0.025 | 1.000 | -0.601 | 0.034 | 0.282 | 0.142 | 0.034 | 0.029 | 0.012
subgrupo | -0.047 | -0.601 | 1.000 | 0.037 | 0.402 | 0.194 | 0.037 | 0.062 | 0.023
marca | 0.957 | 0.034 | 0.037 | 1.000 | 0.069 | 0.087 | 1.000 | 0.332 | 0.168
accion | 0.552 | 0.282 | 0.402 | 0.069 | 1.000 | 0.218 | 0.069 | 0.033 | 0.034
accion_modelo | 0.812 | 0.142 | 0.194 | 0.087 | 0.218 | 1.000 | 0.087 | 0.037 | 0.038
marca.1 | 0.957 | 0.034 | 0.037 | 1.000 | 0.069 | 0.087 | 1.000 | 0.332 | 0.168
tipo_carroceria | 0.957 | 0.029 | 0.062 | 0.332 | 0.033 | 0.037 | 0.332 | 1.000 | 0.056
anio_range | 0.957 | 0.012 | 0.023 | 0.168 | 0.034 | 0.038 | 0.168 | 0.056 | 1.000
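The report does not state which coefficient produced each cell of this table. For a pair of categorical columns, one common measure of association is Cramér's V. It is 1.0 when one column fully determines the other, which is consistent with the 1.000 shown for marca against marca.1. The sketch below implements the uncorrected form in plain Python, with no bias correction. It illustrates the measure and is not claimed to be the computation behind this table.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (no bias correction)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


marca = ["HYUNDAI", "NISSAN", "HYUNDAI", "KIA"]
print(cramers_v(marca, list(marca)))  # identical columns give 1.0
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # independent columns give 0.0
```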

Missing values

A simple visualization of nullity by column. [Figure not reproduced]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion. [Figure not reproduced]

Sample

First rows

index | numero_aviso | marca | codigo_irs | nombre | accion | accion_modelo | marca.1 | linea | grupo | subgrupo | tipo_carroceria | anio_range
0 | 44023 | HYUNDAI | 03060003nndn | computadora | cambiar | cambiar | HYUNDAI | accent | 3 | 6 | sedan | 2012 - 2009
1 | 44023 | HYUNDAI | 03030005nndn | cable ground de bateria | cambiar | cambiar | HYUNDAI | accent | 3 | 3 | sedan | 2012 - 2009
2 | 17203 | NISSAN | 02090225nidn | pollera delantero izquierdo | cambiar | cambiar | NISSAN | FRONTIER | 2 | 9 | pickup | 2016 - 2013
3 | 37133 | KIA | 03040016nidn | lampara delantero izquierdo | cambiar | reparar | KIA | rio | 3 | 4 | sedan | 2020 - 2017
4 | 17434 | HONDA | 02090021nddn | bisagra derecho tapa motor | cambiar | cambiar | HONDA | CRV | 2 | 9 | camioneta | 2016 - 2013
5 | 17434 | HONDA | 02090026nidn | bisagra tapa motor izquierdo | cambiar | cambiar | HONDA | CRV | 2 | 9 | camioneta | 2016 - 2013
6 | 17434 | HONDA | 02080020nndn | moldura parrilla | cambiar | cambiar | HONDA | CRV | 2 | 8 | camioneta | 2016 - 2013
7 | 37133 | KIA | 02090112nndn | defensa delantero | reparar | reparar | KIA | rio | 2 | 9 | sedan | 2020 - 2017
8 | 37133 | KIA | 02090067nidn | guardafango izquierdo | reparar | reparar | KIA | rio | 2 | 9 | sedan | 2020 - 2017
9 | 53420 | NISSAN | 02090112nndn | defensa delantero | reparar | reparar | NISSAN | XTRAIL | 2 | 9 | camioneta | 2020 - 2017

Last rows

index | numero_aviso | marca | codigo_irs | nombre | accion | accion_modelo | marca.1 | linea | grupo | subgrupo | tipo_carroceria | anio_range
103022 | 67217 | HYUNDAI | 05040060nnni | pechera central de motor | cambiar | cambiar | HYUNDAI | TUCSON | 5 | 4 | camioneta | 2024 - 2021
103023 | 67217 | HYUNDAI | 02080015nddn | fleer guardafango derecho | cambiar | cambiar | HYUNDAI | TUCSON | 2 | 8 | camioneta | 2024 - 2021
103024 | 67217 | HYUNDAI | 02080016nidn | fleer guardafango izquierdo | cambiar | cambiar | HYUNDAI | TUCSON | 2 | 8 | camioneta | 2024 - 2021
103025 | 67217 | HYUNDAI | 02090584nndn | tapa de gancho de remolque defensa delantero | cambiar | cambiar | HYUNDAI | TUCSON | 2 | 9 | camioneta | 2024 - 2021
103026 | 67217 | HYUNDAI | 02090214nndn | porta placa delantero | cambiar | cambiar | HYUNDAI | TUCSON | 2 | 9 | camioneta | 2024 - 2021
103027 | 67217 | HYUNDAI | 05050023nndn | deflector plastico marco frontal | cambiar | cambiar | HYUNDAI | TUCSON | 5 | 5 | camioneta | 2024 - 2021
103028 | 67229 | TOYOTA | 02090076nitn | cubre polvo plastico trasero izquierdo | cambiar | cambiar | TOYOTA | HILUX | 2 | 9 | pickup | 2024 - 2021
103029 | 67229 | TOYOTA | 02090227nitn | pollera trasero izquierdo | cambiar | reparar | TOYOTA | HILUX | 2 | 9 | pickup | 2024 - 2021
103030 | 67229 | TOYOTA | 05070010nnni | eje de mando | cambiar | cambiar | TOYOTA | HILUX | 5 | 7 | pickup | 2024 - 2021
103031 | 67229 | TOYOTA | 02090150nitn | punta chasis trasero izquierdo | reparar | reparar | TOYOTA | HILUX | 2 | 9 | pickup | 2024 - 2021
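In every sample row, the first two digits of codigo_irs equal grupo and the next two equal subgrupo. For example, 03060003nndn has grupo 3 and subgrupo 6, and 02110041nndn (from the duplicate-rows table) has grupo 2 and subgrupo 11. This pattern is observed in the sample only and is not documented in the report. The sketch below checks it over arbitrary rows. `prefix_matches` is a hypothetical helper, and the test rows are copied from this report's sample and duplicate tables.

```python
def prefix_matches(code, grupo, subgrupo):
    """True if `code` starts with grupo and subgrupo as two-digit fields."""
    return int(code[:2]) == grupo and int(code[2:4]) == subgrupo


# (codigo_irs, grupo, subgrupo) taken from the sample and duplicate-row tables
sample = [
    ("03060003nndn", 3, 6),
    ("03030005nndn", 3, 3),
    ("02090225nidn", 2, 9),
    ("02080020nndn", 2, 8),
    ("05040060nnni", 5, 4),
    ("02110041nndn", 2, 11),
]
mismatches = [row for row in sample if not prefix_matches(*row)]
print(mismatches)  # []
```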

Duplicate rows

Most frequently occurring

index | numero_aviso | marca | codigo_irs | nombre | accion | accion_modelo | marca.1 | linea | grupo | subgrupo | tipo_carroceria | anio_range | # duplicates
46 | 52862 | KIA | 03040016nidn | lampara delantero izquierdo | cambiar | cambiar | KIA | RIO SEDAN | 3 | 4 | sedan | 2020 - 2017 | 3
95 | 58913 | TOYOTA | 05060002nidn | amortiguador delantero izquierdo | cambiar | cambiar | TOYOTA | RUSH | 5 | 6 | camioneta | 2020 - 2017 | 3
0 | 13607 | TOYOTA | 02090200nndn | tapa motor | reparar | reparar | TOYOTA | HILUX | 2 | 9 | pickup | 2020 - 2017 | 2
1 | 19085 | NISSAN | 02110041nndn | vidrio parabrisas delantero | cambiar | cambiar | NISSAN | sentra sedan | 2 | 11 | sedan | 2004 - atrás | 2
2 | 28272 | NISSAN | 02090551ninn | base manigueta delantero izquierdo | cambiar | cambiar | NISSAN | NAVARA PICK UP | 2 | 9 | pickup | 2016 - 2013 | 2
3 | 42426 | HYUNDAI | 02090156nntn | refuerzo central defensa trasero | cambiar | cambiar | HYUNDAI | tucson | 2 | 9 | camioneta | 2024 - 2021 | 2
4 | 43909 | HONDA | 03070015nddn | sensor de impacto airbag o bolsa derecho | cambiar | cambiar | HONDA | PILOT | 3 | 7 | camioneta | 2016 - 2013 | 2
5 | 44151 | KIA | 05050021nddn | deflector de aire radiador derecho | cambiar | cambiar | KIA | SPORTAGE | 5 | 5 | camioneta | 2020 - 2017 | 2
6 | 44248 | HYUNDAI | 02090214nndn | porta placa delantero | cambiar | cambiar | HYUNDAI | TUCSON | 2 | 9 | camioneta | 2020 - 2017 | 2
7 | 44248 | HYUNDAI | 03040030ndtn | luz placa trasero derecho | cambiar | cambiar | HYUNDAI | TUCSON | 3 | 4 | camioneta | 2020 - 2017 | 2
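The "# duplicates" column above counts how many times each complete row occurs. The sketch below reproduces that grouping for rows given as tuples. This report does not state whether the overview's figure of 122 counts distinct duplicated rows or extra copies, so the sketch covers only the per-row counts. `duplicate_groups` is a hypothetical helper, and the short illustrative rows borrow values from the table above.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, occurrences) for every row seen more than once.

    Sorted by occurrences, largest first, like the '# duplicates' column.
    """
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: item[1], reverse=True)


# Illustrative (numero_aviso, codigo_irs, accion) rows built from the table above
rows = [
    (13607, "02090200nndn", "reparar"),
    (52862, "03040016nidn", "cambiar"),
    (13607, "02090200nndn", "reparar"),
    (52862, "03040016nidn", "cambiar"),
    (19085, "02110041nndn", "cambiar"),
    (52862, "03040016nidn", "cambiar"),
]
for row, n in duplicate_groups(rows):
    print(row, n)
```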